29 research outputs found

    Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media

    With the rise of social media, millions of people routinely express their moods, feelings, and daily struggles with mental health issues on platforms like Twitter. Unlike traditional observational cohort studies conducted through questionnaires and self-reported surveys, we explore the reliable detection of clinical depression from tweets obtained unobtrusively. Based on the analysis of tweets crawled from users with self-reported depressive symptoms in their Twitter profiles, we demonstrate the potential for detecting clinical depression symptoms that emulate the PHQ-9 questionnaire clinicians use today. Our study uses a semi-supervised statistical model to evaluate how the duration of these symptoms and their expression on Twitter (in terms of word usage patterns and topical preferences) align with the medical findings reported via the PHQ-9. Our proactive and automatic screening tool identifies clinical depressive symptoms with an accuracy of 68% and a precision of 72%. Comment: 8 pages, Advances in Social Networks Analysis and Mining (ASONAM), 2017 IEEE/ACM International Conference
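    The semi-supervised setup described in this abstract can be illustrated with a small self-training sketch: a handful of tweets labeled for a PHQ-9-style symptom plus unlabeled tweets, and a classifier that iteratively adds its own confident predictions to the training set. This is a minimal sketch, not the paper's actual model; the example tweets, labels, and confidence threshold are hypothetical, and it assumes scikit-learn's SelfTrainingClassifier over TF-IDF features.

    # Minimal self-training sketch (not the paper's model): a few tweets with
    # PHQ-9-style symptom labels plus unlabeled tweets; unlabeled items are
    # marked with -1, as scikit-learn's semi-supervised API expects.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.semi_supervised import SelfTrainingClassifier

    tweets = [
        "can't sleep again, third night in a row",   # hypothetical examples
        "no interest in anything lately",
        "great run this morning, feeling good",
        "so tired of everything",
    ]
    labels = [1, 1, 0, -1]   # 1 = symptom expressed, 0 = not, -1 = unlabeled

    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),
        # Pseudo-label unlabeled tweets whose predicted probability exceeds 0.8.
        SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=0.8),
    )
    model.fit(tweets, labels)
    print(model.predict(["everything feels hopeless these days"]))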

    Knowledge-Enabled Entity Extraction

    Information Extraction (IE) techniques are developed to extract entities, relationships, and other detailed information from unstructured text. Most methods in the literature focus on supervised machine learning, which is often impractical because of the high cost of obtaining annotations and the difficulty of creating gold standards that are both reliable and comprehensive. Semi-supervised and distantly supervised techniques have therefore been gaining traction as a way to overcome some of these challenges, such as bootstrapping the learning quickly. This dissertation focuses on information extraction, and in particular on entities, i.e., Named Entity Recognition (NER), across multiple domains, including social media and more grammatical texts such as news and medical documents. The work explores ways of lowering the cost of building NER pipelines with the help of available knowledge, without compromising extraction quality and while taking into account feasibility and other concerns such as user experience. I present distantly supervised (dictionary-based), supervised (with cost reduced through entity set expansion and active learning), and minimally supervised NER approaches. In addition, I discuss the various aspects of these knowledge-enabled NER approaches and how and why they are a better fit for today's real-world NER pipelines in dealing with, and partially overcoming, the difficulties above. I present two dictionary-based NER approaches. The first extracts location mentions from text streams and proved very effective for stream processing, with competitive performance against ten other techniques. The second is a generic NER approach that scales to multiple domains and is minimally supervised, with a human in the loop providing online feedback. Both techniques augment and filter the dictionaries to compensate for their incompleteness (due to lexical variation between dictionary records and mentions in the text) and to eliminate the noise and spurious content in them. The third technique is supervised but with a reduced cost in terms of the number of labeled samples and the complexity of annotation; the reduction is achieved with the help of a human in the loop and smart instance samplers implemented using entity set expansion and active learning. The use of knowledge, the monitoring of the NER models' accuracy, and the full exploitation of input from the human in the loop were key to overcoming the practical and technical challenges. The data and code for the approaches presented in this dissertation are publicly available.
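    As a rough illustration of the dictionary-based (distantly supervised) flavor of NER described above, the following is a minimal sketch, not the dissertation's implementation: a small typed gazetteer matched against text with longest-entry-first, case-insensitive lookup so that some lexical variation between dictionary records and mentions is absorbed. The gazetteer entries and the example sentence are hypothetical.

    # Minimal dictionary-based NER sketch (hypothetical gazetteer and text).
    import re

    GAZETTEER = {
        "new york city": "LOCATION",
        "new york": "LOCATION",
        "mount sinai hospital": "ORGANIZATION",
    }

    def dictionary_ner(text, gazetteer):
        """Return (surface form, label, start, end) for every gazetteer hit."""
        spans = []
        # Try longer entries first so "new york city" wins over "new york".
        for entry in sorted(gazetteer, key=len, reverse=True):
            pattern = r"\b" + re.escape(entry) + r"\b"
            for m in re.finditer(pattern, text, re.IGNORECASE):
                # Skip hits nested inside an already accepted longer match.
                if any(s <= m.start() < e for _, _, s, e in spans):
                    continue
                spans.append((m.group(0), gazetteer[entry], m.start(), m.end()))
        return sorted(spans, key=lambda t: t[2])

    print(dictionary_ner("Flooding reported near Mount Sinai Hospital in New York City.", GAZETTEER))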

    D-Record: Disaster Response and Relief Coordination Pipeline

    We employ multi-modal data (i.e., unstructured text, gazetteers, and imagery) for location-centric demand/request matching in the context of disaster relief. After classifying the Need expressed in a tweet (the WHAT), we leverage OpenStreetMap to geolocate that Need on a computationally accessible map of the local terrain (the WHERE), populated with location features such as hospitals and housing. Further, our novel use of flood mapping based on satellite images of the affected area supports the elimination of candidate resources that are not accessible by road transportation. The resulting map-based visualization combines disaster-related tweets, imagery, and pre-existing knowledge-base resources (gazetteers) to reduce decision-making latency and enhance resiliency by assisting individual decision-makers and first responders in coordinating relief efforts.
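    A rough sketch of the matching step described above, under stated assumptions: the coordinates, resource names, and the flooded/not-flooded flag are hypothetical (in the pipeline the flag would come from satellite-based flood mapping and the candidates from OpenStreetMap features). The sketch picks the nearest candidate resource to a geolocated Need that is not cut off by flooding; it is an illustration, not the D-Record code.

    # Hypothetical matching sketch: nearest non-flooded resource to a Need.
    from math import radians, sin, cos, asin, sqrt

    def haversine_km(lat1, lon1, lat2, lon2):
        """Great-circle distance in kilometres between two (lat, lon) points."""
        lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
        a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
        return 2 * 6371.0 * asin(sqrt(a))

    candidates = [
        {"name": "Hospital A", "lat": 29.76, "lon": -95.37, "flooded": False},
        {"name": "Hospital B", "lat": 29.74, "lon": -95.39, "flooded": True},   # road access flooded
        {"name": "Shelter C",  "lat": 29.80, "lon": -95.30, "flooded": False},
    ]

    def match_need(need_lat, need_lon, candidates):
        """Nearest candidate resource whose access is not blocked by flooding."""
        reachable = [c for c in candidates if not c["flooded"]]
        return min(reachable, key=lambda c: haversine_km(need_lat, need_lon, c["lat"], c["lon"]))

    print(match_need(29.75, -95.38, candidates)["name"])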
